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Abstract 

In Trypanosoma cruzi the isoprenoid and sterol blosyntliesis patliways are validated targets for chemotherapeutic 
intervention. In this work we present a study of the genetic diversity observed in genes from these pathways. Using a 
number of bioinformatic strategies, we first identified genes that were missing and/or were truncated in the T. cruzi 
genome. Based on this analysis we obtained the complete sequence of the ortholog of the yeast ERG26 gene and identified 
a non-orthologous homolog of the yeast ERG25 gene (sterol methyl oxidase, SMO), and we propose that the orthologs of 
ERG25 have been lost in trypanosomes (but not in Leishmanias). Next, starting from a set of 16 7. cruzi strains representative 
of all extant evolutionary lineages, we amplified and sequenced ~24 Kbp from 22 genes, identifying a total of 975 SNPs or 
fixed differences, of which 28% represent non-synonymous changes. We observed genes with a density of substitutions 
ranging from those close to the average {~2.5/l 00 bp) to some showing a high number of changes (1 1 .4/1 00 bp, for the 
putative lathosterol oxidase gene). All the genes of the pathway are under apparent purifying selection, but genes coding 
for the sterol C14-demethylase, the HMG-CoA synthase, and the HMG-CoA reductase have the lowest density of missense 
SNPs in the panel. Other genes (TcPMK, TcSMO-like) have a relatively high density of non-synonymous SNPs (2.5 and 1.9 
every 100 bp, respectively). However, none of the non-synonymous changes identified affect a catalytic or ligand binding 
site residue. A comparative analysis of the corresponding genes from African trypanosomes and Leislimania shows similar 
levels of apparent selection for each gene. This information will be essential for future drug development studies focused on 
this pathway. 
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Introduction 

Trypanosoma cruzi, a protozoan parasite of the order Kinetoplas- 
tida, is the causative agent of Chagas Disease, a neglected disease 
tliat is endemic in South America, afTecting in excess of 8 million 
people [1]. The currently available drugs used to treat Chagas 
Disease (Nifurtrmox, Benznidazole) have several drawbacks 
including toxicity, and the fact that they are mostly effective 
during the acute phase of the infection. 

The T. cruzi species has a structured population, with a 
predominantly clonal mode of reproduction, with infrequent 
genetic exchange [2,3] . Through the use of a number of genetic 
markers the population has been divided into six evolutionary 
lineages, also called Discrete Typing Units (DTUs) [4,5], now 
designated as Tcl to TcVI [6] . Lineages TcV and TcVI (this latter 
lineage includes the strain used for the frrst genomic sequence of T. 
cruzi, CL Brener) have a very high degree of heterozygosis. The 
currently favoured hypothesis suggests that these two lineages 
originated after one or two ancestral hybridization events [7-9]. 
There are no general agreement regarding the estimated time of 
divergence of the six T. cruzi lineages; however recent estimations 



suggest that major lineages (excluding the hybrid lineages TcV and 
TcVI) diverged 1-4 millions of years ago [9,10] while the hybrid 
lineages emerged much more recently (less than 1 million years 
ago, according to Flores-Lopez and Machado [9], and within the 
last 60,000 years, according to Lewis MD, et at. [10]). This 
divergence was accompanied with a diversification of phenotypic 
and biological properties. Indeed, several investigations suggest 
that the observed diversity in host preference, cell tropism and 
drug susceptibility might be properties of different strains and/or 
lineages [11-14]. 

The parasite ergosterol biosynthesis pathway is one of the major 
routes for chemotherapeutic intervention against T. cruzi. Humans 
and trypanosomes share many of the enzymes leading to essential 
isoprenoid and sterol precursors. However, the mevalonate 
pathway is more related to that found in bacteria and archaea 
[15], and the ergosterol pathway is essentially similar to that in 
fungi, with a number of key steps that differ with the cholesterol 
biosynthesis pathway of mammals. 

Inhibitors that block sterol biosynthesis or the biosynthesis of 
isoprenoid precursors inhibit growth of the parasite and cause 
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severe morphological defects [16,17]. Triazole derivatives that 
inhibit the parasite C14-0( sterol demethylase are the most 
promising compounds, with proved curative activity in murine 
models of acute and chronic Chagas disease [18-22]. And one of 
them (posaconazole) is undergoing a number of clinical trials. 
Other ergosterol biosynthesis inhibitors with good potency in vitro 
or in vivo, or with good activity in an enzymatic assasy include 
those that target 3-hydroxy-3-methyl-glutaryl-CoA reductase [23- 
25], farnesyl diphosphate syntetase [26], squalene synthase[27,28], 
squalene epoxidase/monooxygenase [29,30], lanosterol synthase/ 
oxidosqualene cyclase [31-33], and 24-C sterol methyl transferase 
[34-36], as well as compounds with dual mechanisms of action 
(ergosterol biosynthesis inhibition and free radical generation, 
reviewed in [16,37]). 

The azoles, like the triazoles, are used extensively for the 
treatment of fungal infections with excellent results, though 
different resistant strains have appeared over time in difiFerent 
species. One of the main resistance mechanisms observed is based 
on a diminished affinity of the target enzyme for the compound; 
which is caused by specific mutations in the ERG 1 1 / CYP5 1 gene. 
Dilferent point mutations were identified in several fungi species as 
responsible for this azoic resistance (reviewed in [38,39]). 

In this paper we analyze the genetic diversity present in the 
ergosterol biosynthesis pathway of T. cruzi and describe the 
apparent genetic selective pressure on these genes. 

Results 

Filling pathway holes: genes involved in sterol 
biosynthesis in T. cruzi 

To analyze the genetic div(;rsity of the T. cruzi sterol biosynthesis 
pathway (SBP) we decided to sequence all enzymes of the 
pathway, starting from enzymes that produce the terpenoid 
backbone precursors, and going down to the last enzyme that 
produces ergosterol as a product. Therefore, as a first step we 
looked for T. cruzi genes that were mapped to the corresponding 
KEGG metabolic pathway maps [40]. SBP genes in KEGG are 
classified in two maps: the steroid biosynthesis pathway map 
(TCROO 1 00, www.genome.jp/kegg/ pathway/ tcr/tcrOO 1 OO.html, 
and the terpenoid backbone biosynthesis pathway map 
(TCR00900, www.genome.jp/kegg/pathway/tcr/tcr00900.html). 
These maps contain information derived from the T. cruzi CL- 
Brener reference genome. From this analysis we were able to 
identify 15 genes mapped to these pathways. However, we also 
detected a number of holes in the pathway: enzymatic reactions 
with no enzyme mapped, and cas(-s in ^\hi(h the enzymes 
available in KEGG were truncated (probably because of genome 
assembly problems). Therefore, before attempting to amplify and 
sequence the corresponding genes, we invested some effort in 
analyzing the existing sequence data to obtain a relevant 
complement of genes. As mentioned, in one case the correspond- 
ing genes from the reference genome were truncated, probably 
because of genome assembly problems. This was the case of the 
isopentenyl-diphosphate delta-isomerase gene (TcIDIl), which 
was cloned and sequenced by Dr. TK Smith (unpublished, 
GenBank accession number: AJ866772, 1071 bp). The corre- 
sponding genes in KEGG, mapped from the CL-Brener genome 
(Esmeraldo-like and non-Esmeraldo-like alleles) were both shorter, 
at 537 bp (TcCLB.408799.19), and 540 bp (TcCLB.5 1043 1.10). 
Therefore for this work we used the full-length TcIDIl sequence 
obtained from GenBank (AJ866772). This sequence produces an 
active enzyme when expressed in an heterologous system (Dr. TK 
Smith, personal communication). 



To fill in other identified gaps, we used the Saccharomyces cmvisiae 
sterol biosynthetic pathway as a reference model (Table 1). The 
yeast SBP has been studied extensively, and is essentially complete 
in pathway databases. Using the yeast genes from these pathways, 
we looked for orthologs in T. cruzi by doing sequence similarity 
searches (BLAST) against the complete T. cruzi genome or using 
databases of orthologs compiled from complete genome data, such 
as the OrthoMCL database [41]. As a result of this strategy, we 
were able to map five additional genes (see Table 2), which are the 
orthologs of the S. cmvisiae genes: ERG 13 (3-hydroxy-3-methyl- 
glutaryl-CoA (HMG-CoA) synthase), ERG7 (lanosterol synthase), 
ERG26 (C-3 sterol dehydrogenase), ERG3, (C-5 sterol desatur- 
ase), and ERG5 (C-22 sterol desaturase). These genes were present 
in the T. cruzi genome, but were not mapped to the corresponding 
metabolic maps in KEGG. Some of these enzymes were already 
characterized biochemically in T. cruzi [25,32]. In all these cases 
except for one (ERG26), the identification of the corresponding 
ortholog did not present further complications. The putative T. 
cruzi ortiiolog of tiie yeast ERG26 gene (TcCLB.510873.10, C3- 
sterol dehydrogenase) was found in the T. cruzi genome database 
as a truncated gene of 675 bp (224 aa). This may explain why it 
was not mapped to the corresponding pathway map in KEGG. 
Reasoning that this might be a consequence of assembly problems, 
as in the case of the TcIDIl gene, we set out to identify the missing 
portions of this gene by performing BLASTN searches against a 
database of unassembled genomic reads (GSS or WGS reads) from 
the CL-Brener genome project (data provided originally by the 
TIGR-SBRI-KI sequencing consortium). Starting with the trun- 
cated TcCLB. 510873. 10 sequence as query, we identified a 
number of matching sequence reads (BLASTN, E-value < 
10£"*°). These sequences were then assembled into a single 
contiguous sequence, which was used as a query in successive 
rounds of BLASTN searches, followed by reassembly. This 
iteration cycle was repeated until we recovered the complete 
(full-length) sc'fjuc'nce of the T. cruzi putative C3-sterol dehydro- 
genase gene, as judged by its alignment against the yeast ERG26 
gene. This reconstructed full-length serjuence of the T. cruzi C-3 
sterol dehydrogenase gene has 1,221 bp and has been used to 
design primers for amplification from T. cruzi DNA (see Methods). 
The final sequence, obtained after amplification and sequencing 
from CL-Brener DNA, was submitted to GenBank under the 
accession number JN050853. 

At this point, there were still two gaps present when modeling 
the T. cruzi SBP on top of the yeast pathway. These correspond to 
the enzymatic reactions catalyzed by the yeast genes ERG27 (3- 
keto sterol reductase) and ERG25 (C-4 methyl sterol oxidase). Our 
failure at identifying the orholog of the yeast ERG27 gene came as 
no surprise, as the enzymes performing C-3 ketoreduction in land 
plants and sterol synthesizing bacteria are stiU unknown [42,43]. 

However, a number of putative homologs of ERG25 in T. cruzi 
were readily identified in sequence similarity searches. In this case, 
it was difficult to discern which of these were the true orthologs of 
the yeast ERG25 gene. 

Apparent loss of ERG25 homologs in T. cruzi and T. brucei 

A BLASTP search using the yeast ERG25p as query against 
kinetoplastid proteomes retrieved 5 T. cruzi significant hits, with 4 
of them having a similar length (278-279 aa. See Table SI), 
sharing a common Pfam Domain (PF04116, Fatty Acid Hydrox- 
ylase SuperfamUy), and a common architecture with 3-A predicted 
fran^-membrane domains (the fifth gene is a truncated copy). These 
hits correspond to two pairs of alleles, and are currentiy annotated 
as 'C-5 sterol desaturases' in the T. cruzi genome database (or 
'lathosterol oxidases', both are synonymous terms). As there is 
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Table 1. Genes in the sterol biosynthesis pathway (SBP) of yeast. 





Gene ID/Locus 


Molecular Function 


EC Number 


Pathway Order 


Ortholog Group 


ERG10/YPL028W 


Acetoacetyl-CoA thiolase 


2.3.1.9 


1 


OG4_10214 


ERG13/YML126C 


3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase 


2.3.3.10 


2 


OG4_11016 


HMG1/2/YML075C, YLR450W 


HMG-CoA reductase 


1.1.1.34 


3 


OG4_11458 


ERG12/YMR208W 


Mevalonate kinase 


2.7.1.36 


4 


OG4_11698 


ERG8/YMR220W 


Phosphomevalonate kinase 


2.7.4.2 


5 


0G4_1 5366 


ERG19/YNR043W 


Mevalonate pyrophosphate decarboxylase 


4.1.1.33 


6 


OG4_11688 


ERG20/YJL167W 


Farnesyl pyrophosphate synthetase 


2.5.1.10 


7 


0G4_ 11009 


IDn/YPL117C 


Isopentenyl diphosphate isomerase {IPP isomerase} 


5.3.3.2 


7' 


OG4_12197 


ERG9/YHR190W 


Farnesyl-diphosphate farnesyl transferase (squalene synthase) 


2.5.1.21 


8 


0G4_1 3084 


ERG1/YGR175C 


Squalene epoxidase 


1.14.99.7 


9 


0G4_1 3490 


ERG7/YHR072W 


Lanosterol synthase (oxidosqualene cyclase) 


5.4.99.7 


10 


OG4_11381 


ERG11/YHR007C 


Lanosterol 14-alpha-demethylase 


1.14.13.70 


11 


OG4_12975 


ERG24/YNL280C 


C-14 sterol reductase 


1.3.1.70 


12 


OG4_12018 


ERG25/YGR060W 


C-4 methyl sterol oxidase 


1.14.13.72 


13 


0G4_1 3007 


ERG26/YGL001C 


C-3 sterol dehydrogenase 


1.1.1.170 


14 


OG4_12533 


ERG27/YLR100W 


3-keto sterol reductase 


1.1.1.270 


15 


OG4_16167 


ERG6/YML008C 


Delta(24)-sterol C-methyltransferase 


2.1.1.41 


16 


OG4_13307 


ERG2/YMR202W 


C-8 sterol isomerase 


5.-.-.- 


17 


OG4_14573 


ERG3/YLR056W 


C-5 sterol desaturase (lathosterol oxidase) 


1.3.3.- 


18 


OG4_12421 


ERG5/YMR015C 


C-22 sterol desaturase 


1.14.14.- 


19 


OG4_14688 


ERG4/YGL012W 


C-24(28) sterol reductase 


1.3.1.71 


20 


OG4_16908 



The table lists the reference genes used in this work for mapping the corresponding T. cruzi genes. Information for this table was derived from the Saccharomyces 
Genome Database (SGD, yeastgenome.org). Ortholog group identifiers are from the OrthoMCL Database, version 4 (orthomcl.org). 
doi:1 0.1 371 /journal.pone.0096762.t001 



detectable similarity between ERG3 and ERG25 genes in yeast as 
well, a similar search using the yeast ERG3p as query retrieves the 
same set of T. cruzi loci. Therefore, to identify which loci correspond 
to the true orthologs of ERG25, we carried out a detailed 
phylogenetic analysis between these genes, in different organisms. 
Using the sequences of fungi, plant, insect, human, fish and 
trypanosomatid enzymes, we obtained a maximum-likelihood tree 
with a clear segregation of ERG25 and ERGS orthologs in 
separate branches (Figure 1). The tree suggests that there is one T. 
cruzi locus (containing TcCLB. 473 1 1 1.10 and its TcIII-like allele, 
TcCLB.507853.10) that is the true ortholog of the yea.st ERG3 
gene. Interestingly, the branch containing ERG25 orthologs does 
not show genes from African or American trypanosomes (the only 
kinetoplastid genes are from Leishmanias). The other trypanoso- 
matid genes identified in our BLAST searches are grouped 
together in a third branch, carrying only trypanosomatid genes. 
These results are consistent with reciprocal best-hits identified in 
BLAST searches (see Table SI). Reciprocal or bidirectional best 
hits provide support for the conjecture that the genes are 
equivalent orthologs [44,45]. In these searches, there is a clear 
drop in the BLAST score when the query sequence is ERG3 or a 
ERG3 homolog. This score drop is not observed when the query 
sequence is ERG25 or an ERG25 homolog. Apart from this 
evidence, there is additional support for this separate branch from 
the BLAST searches, in the observed reciprocity of best hits. For 
example when using the L. major gene LmjF.36.2540 as query 
(grouped with ERG25 orthologs in Fig 1) the best hits are the yeast 
ERG25 and their orthologs. Also viceversa, when doing the 
reciprocal BLAST search using the yeast ERG25p as query, the 
best L. major hit is again LmjF.36.2540, even though in this case 



the scores are lower than for those obtained for ERGS orthologs. 
These bidirectional hits are not observed for the group of 
trypanosomatid genes grouped in the middle branch in the 
phylogenetic tree. 

Taken together, the most parsimonious interpretation of these 
results is that the C-4 sterol oxidase gene has been lost in the 
ancestor of T. cruzi and T. hrucei (but not in the ancestor of 
Leishmanias). The presence of Leishmanial and Trypanosomal 
proteins grouped in the middle branch of the tree suggests that 
these group of homologs have a different ancestral origin. 
However, because C-4 demethylation is essential, we hypothesize 
that they may have a C-4 sterol oxidase activity (see Discussion). 
For these reasons we decided to call them Sterol Methyl Oxidase- 
like (SMO-like) in this work. 

Table 2 summarizes the results of our efforts to fill in the missing 
genes in the pathway, whereas Figures 2 and S show the 
reconstructed isoprenoid and sterol metabolic pathways, respec- 
tively. Once the T. cruzi SBP genes were identified, we proceeded 
to study the genetic diversity by re-sequencing these genes in a 
panel of T. cruzi strains, representative of all the extant 
evolutionary lineages of the parasite. 

Genetic diversity in the sterol biosynthesis pathway 

To obtain sequence information from the selected genes we 
decided to use a methodology based on PCR amplification 
followed by direct sequencing. Therefore, it was important to 
reduce the possibility of amplification problems generated by 
polymorphisms that could prevent the annealing of the primers. A 
number of aspects were considered to reduce these risks i) we 
decided to focus our analysis on coding sequences where possible. 
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Table 2. Genes in the Trypanosoma cruzi sterol biosynthesis pathway. 





SBP Ortholog 


T. cruzi Locus Identifiers(s) 


Current annotation 


Ortholog Group 


In KEGG? 


Gene Name 


ERG 10 


TcCLB.51 1003.60 


Hypothetical protein 


0G4_ 


.10214 


Yes 


TcACAT 


ERG 13 


TcCLB.51 1903.40, TcCLB.51 1 071 .50 


Hypothetical protein, conserved 


0G4_ 


.11016 


No 


TcHMGS 


HMG1, HMG2 


TcCLB. 5091 67.20, TcCLB.506831 .40 


3-hydroxy-3-methylglutaryl-CoA reductase, putative 


0G4_ 


.11458 


Yes 


TcHMGR 


ERG 12 


TcCLB.436521.9, TcCLB.509237.10 


Mevalonate kinase, putative 


0G4_ 


.1 1 698 


Yes 


TcMK 


ERGS 


TcCLB.508277.140, TcCLB.50791 3.20 


Phosphomevalonate kinase e-like protein, putative 


0G4_ 


.15366 


Yes 


TcPMK 


ERG 19 


TcCLB.507993.330, TcCLB.51 1 281 .40 


Diphosphomevalonate decarboxylase, putative 


0G4_ 


.1 1 688 


Yes 


TcMVD 


ERG20 


TcCLB.51 1823.70, TcCLB.508323.9 


Farnesyl pyrophosphate synthase, putative 


0G4_ 


.1 1 009 


Yes 


TcFPPS 


IDII 


TcCLB.408799.19, TcCLB.510431.10 


Isopentenyl-dlphosphate delta-isomerase, putative 


0G4_ 


.14499 


Yes 6 


TclDI 


ERG9 


TcCLB.507897.20, TcCLB.508369.20 


Farnesyl transferase, putative; squalene synthase, 
putative 


0G4_ 


.13084 


Yes 


TcSQS 


ERGl 


TcCLB.509589.20, TcCLB.503999.10 


Squalene monooxigenase, putative 


0G4_ 


.13490 


Yes 


TcSQLE 


ERG7 


TcCLB.5081 75.70, TcCLB.506825.170 


Lanosterol synthase, putative 


0G4_ 


.11381 


No 


TcOSC 


ERGll 


TcCLB.506297.260, TcCLB.51 01 01 .50 


Lanosterol M-a-demethylase, putative 


0G4_ 


.12975 


Yes 


Tc14DM, CCYP51 


ERG24 


TcCLB.507969.60, TcCLB.5071 29.30 


C-14 sterol reductase, putative 


0G4_ 


.12018 


Yes 


Tc14SR 




TcCLB.51 1339.20, TcCLB.51 1 895.69, 
TcCLB.509235.20 


C-5 sterol desaturase, putative 


0G4_ 


.20087 


No 


TcSMO-like 


ERG26 


TcCLB.51 0873. 10 


NAD(P}-dependent steroid dehydrogenase protein, 
putative 


0G4_ 


.12533 


No 


TcNDSDHL 


ERG27 










No 




ERG6 


TcCLB.5041 91.10, TcCLB.505683.1 0, 
TcCLB.510185.10 


Sterol 24-C-methyltransferase, putative 


0G4_ 


.13307 


Yes 


Tc24SMT 


ERG2 


TcCLB.51 0329.90 


C-8 sterol isomerase, putative 


0G4_ 


.14573 


Yes 


TcSSI 


ERGS 


TcCLB.4731 11.10, TcCLB.507853.10 


Lathosterol oxidase, putative 


0G4_ 


.12421 


No 


TcSCSD 


ERGS 


TcCLB.506945.190 


Cytochrome p450-llke protein, putative 


0G4_ 


.14688 


No 


TCSC22D 


ERG4 


TcCLB.506577.1 20, TcCLB.507709.90 


Sterol C-24 reductase, putative 


0G4_ 


.16908 


Yes 


TCSC24R 



The table lists T. cruzi genes mapped to either the corresponding KEGG maps, or the yeast SBPs {see Table 1). OrthoMCL Identifiers are from the OrthoMCL Database 
version 4. T. cruzi gene Identifiers are those currently available at the TriTrypDB resource. The fifth column shows whether the gene was mapped to the corresponding 
KEGG metabolic map at the time of this writing. The last column shows the nomenclature used In this work. Some of these gene names were previously used in the 
literature, and others are used here for the first time, based on the gene names of relevant orthologs. In the case of the TclDI gene, the two listed alleles are truncated 
copies of the gene. The full-length gene Is deposited In GenBank as AJ866772. While In the case of the TcNSDHL gene, the copy annotated by the genome project is 
truncated due to genome assembly problems. The full-length gene Is deposited In GenBank as JN050853 {this work). 
doi:10.1371/journal.pone.0096762.t002 



as they are generally less variable than non-coding sequences; ii) 
we used the T. cruzi SNP database (TcSNP, http:// snps.tcruzi.org) 
[46] , to select the best regions for primer design, avoiding regions 
with candidate SNPs; and iii) we also limited the size of 
amplification products, to optimize the quality of the final 
sequence (see Methods). In this latter case, an optimal size was 
set to ~ 750-800 bp, which would allow us to get complete 
sequence coverage, with good quality on both strands. Depending 
on the size of each gene, one or more overlapping amplification 
products had to be analyzed (the number of amplification products 
per gene is hsted in Table 3). 

AH amplification products were analyzed with a software 
package (PolyPhred, see Methods) that allows the identification 
of heterozygous peaks in the chromatograms. Therefore, for all 
selected genes, we identified two types of sequence polymorphisms: 
i) allelic variation within a strain/clone (heterozygous peaks 
identified by Polyphred); and ii) variation between strain/clones 
(identified in a multiple sequence alignment of sequenced 
products). For every gene, we identified the position of the variant 
sites, and the type of change introduced (synonymous or non 
synonymous). The complete information for each gene is available 
as supplementary material (Table S2), and a table summarizing 
these results is included herein (Table 3). Using this strategy, we 
generated ~24 Kb of sequence, from 16 strains covering the 6 



major T. cruzi evolutionary lineages. Overall we attained a 
coverage of ~90% of the total coding sequence for the selected 
genes. From this analysis we identified 975 polymorphic sites, 
producing 965 codon changes, which generate 692 synonymous 
changes (72%) and 273 non-synonymous changes (28%, see 
Table 3). Interestingly, we uncovered heterozygous SNPs in all 
genes of the pathway, even for those which are currentiy 
represented in the reference genome in an haploid state (e.g. 
TcACAT, and TcSC22D). The genes Tc24SMT, TcHMGR and 
TcNSDHL had the lowest SNP density (2.68, 2.74 and 2.50 every 
100 bp respectively), while TcSC5D had the highest SNP density 
(1 1.39 every 100 bp). This density is at least double that which is 
found in other SBP genes. Nevertheless, this gene is under an 
apparently high purifying selection, as suggested by its low dN/ dS 
ratio (Table 4). Upon further investigation of this gene, we noticed 
a number of informative SNPs that could be exploited in a lineage 
typing assay. Therefore, we performed a separate re-sequencing 
experiment for this gene, in an expanded panel of strains, and 
developed two alternative typing assays based on this bcus [47]. 

We next assessed the accumulation of changes at the protein 
level (missense polymorphisms), and observed that the genes 
TcSMO-like, TcMVD and TcPMK have the highest non- 
synonymous SNP density (1.82, 1.94 and 2.50 non-synonymous 
SNPs every 100 bp respectively). In spite of this, the dN/dS ratio 
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Figure 1. Phylogenetic tree of selected ERG3/ERG25 orthologs. Seleced orthologs from kinetoplastids, plants, vertebrates, invertebrates and 
fungi were aligned with t_coffee. A phylogenetic reconstruction was calculated using PhyML (LG model, bootstrap resampling with N = 1000). For 
clarity, highly similar genes/alleles were not included in the final tree. Organism abbreviations are: Ibra = L braziliensis; Imex = L mexicana; linf = L 
infantum; Imaj = L. major, tcru = T. cruzi; tviv = T. vivax; tcon = 7". congolense; tbru = T. brucei; agam = Anoptieles gambiae; aaeg = Aedes aegypti; 
cpip = Culex pipiens; hsap = Homo sapiens; calb = Candida albicans; seer = S. cerevisiae; spom = Schizosaccharomyces pombe; atha = A. ttialiana; 
drer = Danio rerio; trub = Tal<ifugu rubripes; tnig = Tetraodon nigroviridis. 
doi:1 0.1 371 /journal.pone.0096762.g001 



for these genes is still low, ranging from 0.028 (TcACAT) to 0. 122 
(TcPMK), indicating the relatively high degree of conservation of 
T. cruzi SBP genes. 

Lineage-specific and intra-lineage differences 

AH the collected data on sequence diversity was also analyzed in 
the context of the current separation of T. cruzi in discrete 
evolutionary lineages. For this analysis we considered three types 
of polymorphisms or fixed differences: i) lineage specific polymor- 
phisms (LSPs) (these are the polymorphic sites that could 
differentiate one lineage from all the others; ii) intra-lineage 
polymorphisms (ILP) (differences between strains belonging to the 
same lineage); and iii) heterozygous sites (those that differ between 
alleles of a single diploid individual). In this comparative analysis 
we found that the lineage Tcl showed more than one LSP every 
200 bp, while in the other lineages, the LSP density was at least 
four times lower (data not shown). This data agrees with the 
hypothesis that Tcl is one of the ancestral lineages [6,9]. Analyzing 
the ILP distribution, we found that the TcIV lineage presents the 
highest density of ILPs, with almost one ILP every 100 bp. 
Interestingly more than ~90% of the ILPs observed in TcIV are 
restricted to the comparison between the South American strain 
Canlll with two strains from North America (Dog Theis and 
92122102R, data not shown). The observed differences among 
TcIV strains from South and North America are in agreement 



with previous observations using different markers which suggest- 
ed a phylogenetic divergence between TcIV from these two 
regions [4,10,48-54]. Indeed, our results show that the number of 
differences between Canlll and these other TcIV strains is even 
greater than those observed between lineage TcV and TcVI (the 
two hybrid lineages). Our results therefore agree with previous 
publications that describe a considerable phylogenetic distance 
between the Canlll and Dog Theis strains, even though they are 
still placed into the same evolutionary lineage based on current 
typing methods. As expected, the hybrid lineages (TcV and TcVI) 
show the highest level of heterozygosity, with more than 3 
heterozygote sites every 200 bp; while TcII showed an interme- 
diate level of heterozygosity with approximately 3 sites every 2 Kb, 
and the other lineages showed less than 10 sites in the 24 Kb 
analyzed. 

Potentially important non-synonymous changes 

In many cases, variations in susceptibility or resistance to a drug 
are associated to mutations that occur near the interaction site of a 
substrate or an inhibitor. To analyze the potential relevance of the 
non-synonymous SNPs found, we gathered relevant information 
from the literature on the selected targets, their PFAM domains, 
PROSITE motifs, and the available crystallographic structures in 
the Protein Data Bank (PDB) for the T. cruzi enzyme or their 
orthologs (see Methods). Starting with these crystal structures, we 
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Figure 2. The T. cri/z/ isoprenoid biosynthesis pathway. The figure shows the metabolic steps leading to farnesyl-diphosphate, from acetyl- 
CoA, and the corresponding yeast and T. cruzi enzymes that catalyze these steps. Gene names used are those in Tables 1 and 2. 
doi:1 0.1 371 /journal.pone.0096762.g002 



first identified residues near the co-crystallized ligands (substrate, 
inhibitor or co-factor). For this we established a maximum distance 
limit of 7 A from each corresponding ligand. This analysis revealed 
a number of potentially important changes in the Tcl4DM 
(TcCYP51), and TcMK genes. 

In the azole target gene, Tcl4DM (TcCYPSl), we identified 
only 5 non-synonymous substitutions, none of which affect key 
residues of the enzyme (see Figure 4). However, after mapping 
these substitutions on top of the available TcCYP51 structure [55], 
we identified a number of potentially important substitutions that 
Ke just next to important residues, or are within 7A of the co- 
crystalhzed hgand. One of these is a A117S substitution that sits 
just next to a tyrosine (Y116) that is within 7 A of the ligand, and 
that has been shown to be involved in the generation of resistance 
to azoles in C. albicans and U. necator [56,57]. In trypanosomatids, 
residue 1 17 marks the start of a short helix in the structure, that is 
uniquely found in trypanosomatids [58] . Alanine at residue 11 7 is 
present in T. bmcei and was found in strains from the TcIII lineage 
in homozygosis and in heterozygosis in hybrid strains, while Serine 
at this position was found in homozygosis in strains from lineages 
Tcl, TcII, and TcIV. 

In the case of the TcMK gene, we identified 12 non- 
synonymous changes. One of tiiese is a H29Y change, located in 
a conserved region of the sequence, close to a group of residues 
that are involved in substrate binding (at <7A of the ligand 



(mevalonate) in the L. ma/or structure (LmjF.3 1.0560, PDB: 2HFU) 
[59]. Histidine is encoded by the genes from strains M5631 and 
XI 09/2, both from lineage TcIII, while tyrosine is encoded in all 
other strains, including M6241, also of lineage TcIII. The second 
possibly important change in TcMK is a G287A substitution, also 
located in a very conserved motif of the MK gene, that is part of 
the GHMP kinase family domain (Pfam PF08544). This particular 
motif, contains a number of highly conserved glycines, some of 
which are within 7 A of the ligand in the L. major MK. In this 
polymorphic position strains from lineage TcV (Sc43 and Mn) 
encode glycine and alanine in heterozygosis, while all the other 
strains encode glycine in homozygosis. 

For other targets, crystallographic structures were not available, 
and so we resorted to analyze non-synonymous substitutions in die 
context of functionally important motifs or domains. In the case of 
the squalene epoxidase gene (TcSQLE), we identified two changes 
(S306G and I307L) that stand between highly conserved proline 
residues (at 95% and 94% identity within the squalene epoxidase 
domain, PF08491). The I307L substitution is apparentiy conser- 
vative, as these are the two most frequent residues in this position 
in the Pfam domain. In contrast, the S306G substitution 
introduces a glycine that is completely absent at this position in 
the 429 sequences currently available for this protein family in the 
Pfam database. Serine is the second most frequent residue in this 
position, and is present in 14% of the sequences, all from fungi 
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Figure 3. The T. cruz/ sterol biosynthesis pathway. The figure shows the metabolic steps leading from farnesyl-diphosphate to ergosterol (in 
yeast, and in T. cruzi epimastigotes) or to different 24-alkylsterols (in T. cruzi amastigotes [64]), and the corresponding yeast and T. cruzi enzymes that 
catalyze these steps. The two methyl groups at C4 are removed in two rounds of successive C4-oxidation, C4-decarboxylation and C3-ketoreduction 
(*). Gene names used are those in Tables 1 and 2. Unknown/hypothetical assignments are shown with question marks. 
doi:1 0.1 371 /journal.pone.0096762.g003 



(glutamine is the most frequent residue, in 50% of the sequences). 
The Ser residue occurs in lineage TcII in homocygosis, and in 
lineages TcV and TcVI in heterocygosis, and its conservation in 
fungi suggests that this is the ancestral character at this position. 
Finally in the C-14 sterol reductase gene (Tcl4SR), we identified 
two consecutive substitutions (P208H, V209F) in strains 
92122102R and Dog Theis (lineage TcIV), that faU within the 
sterol reductase family signature motif (PS01017). However the 



substitutions are conservative, because they are both described by 
the motif pattern. 

The phosphomevalonate kinase of T. cruzi is not under a 
strong purifying selection 

The phosphomevalonate kinases (PMKs) of pathogenic bacte- 
ria, fungi, and trypanosomes are attractive targets for the design of 
selective inhibitors, because the same phosphorylation reaction in 
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Table 4. Comparative analysis of the genetic diversity present in the sterol biosynthesis pathway of different kinetoplastids. 





dN/dS im) 


YfiBst C|6n6r Std symbol 


Af Tryps 




Tcr 


ERGIO/ACAT 






0.028 


ERG13/HMGS 


0.057 


0.048 


0.054 


HMG1/HMGR 


0.065 


0.052 


0.057 


ERG12/MK 


0.077 


0.076 


0.077 


ERG8/PMK 


0.1 lot 


0.1 22t 


0.1 22t 


ERGig/MVD 


0.067 


0.072 


0.059 


IDI1/IDI 


0.051* 


0.043* 


0.048* 


ERG20/FPPS 


0.060 


0.061 


0.065 


ERG9/FDFT, SQS 


0.085 


0.087 


0.093 


ERGl/SQLE 


0.1 15t 


0.1 07t 


0.1 15t 


ERG7/LSS, OSC 


0.1 09t 


0.1 13t 


0.1 13t 


ERG11/14DM, CYP51 


0.042* 


0.040* 


0.039* 


ERG24/14SR, TM7SF2 


0.1 42t 


0.1 09t 


0.1 lot 


ERG25/SMO/SC4MOL 




0.271 




-/SMO-like 


0.080 


0.082 


0.092 


ERG26/NSDHL 


0.051* 


0.057 


0.058 


ERG27/3KSR - - - 


ERG6/24SMT 


0.065 


0.062 


0.070 


ERG2/8SI 


0.1 15t 


0.100 


0.097 


ERG3/SC5D 


0.069 


0.059 


0.060 


ERG5/SC22D 




0.028* 


0.107t 


ERG4/SC24R 




0.097 


0.070 


Average 


0.080 


0.075 


0.079 


Standard deviation 


0.028 


0.028 


0.026 



The table shows the dN/dS (to) ratio observed in the afrlcan trypanosome lineage (Af Tryps) (T. brucei brucei, T. brucei gambiense, T. evansi, T. congoiense, T. vivax), in the 
Leishmania lineage (Lelsh) {L. major, L. infantum, L brazitiensis, and L. mexicana), and in the T. cruzi lineage (Tcr, sequences from Tcl{TcVI DTDs). The values that are above 
(t) or under (*) a standard deviation from the average of each column are marked. 
doi:10.1371/journal.pone.0096762.t004 



humans and other animals is catalyzed by a noii-orthologous 
enzyme [60] . In the first group of organisms, phosphorylation of 
mevalonate is performed by orthologs of the yeast ERGS gene, 
while in animals it is performed by a group of orthologs of the 
human PMK gene (hPMK). These two groups of enzymes differ in 
a number of kinetic, biophysical properties, and in the ATP- 
binding motifs (ERG8-like kinases contain a protein kinase motif, 
while orthologs of the human enzyme have a P-loop or "Walker 
A" motif) [60,61]. Analysis of the TcPMK gene in different strains 
of T. cruzi showed that the gene has accumulated 68 changes (35 
synonymous, 33 non-synonymous, see Table 3), and 1 non-sense 
substitution (see below) in these independently evolving lineages. 
Based on the dN/dS ratio, the gene is under purifying selection 
(dN/dS < 1, see Table 4). However, it displays the highest dN/dS 
ratio of the pathway, the highest density of missense SNPs (2.5 
non-synonymous SNPs/ 100 bp), as well as the highest ratio of 
synonymous to non-synonymous SNPs (~1). Taken together, 
these data show that this is the gene that is apparently under more 
relaxed selection in the SBP pathway of T. cruzi 

Moreover, an interesting substitution was observed at codon 
136 (407 bp) in this gene in the T. cruzi strain IW (TcII). In this 
strain, one allele encoded a Serine (TCA), as in all other strains, 
whereas the second allele encoded a premature STOP codon 
(TAA). All ERG8-like PMKs are composed of two GHMP kinase 



domains: an N-terminal domain (Pfam PF00288) located in the 
middle of the protein (starting at residue 160 in T. cruzi), and a C- 
terminal domain (PF08544) closer to the C-termini. In T. cruzi, the 
nonsense mutation in the IW strain is located upstream of this 
first domain (PF00288). Considering that the next possible 
translational start codon is located downstream of this domain, 
the IW strain is therefore probably producing a very short non- 
functional protein from this allele, or two truncated proteins, both 
devoid of this domain. Thus, the most plausible hypothesis is that 
the IW strain carries only one functional PMK allele, and can be 
considered naturally hemizygous for the PMK gene. 

Comparative analysis of genetic diversity in kinetoplastid 
sterol biosynthesis pathways 

Although phylogeneticaUy related, kinetoplastid parasites have 
evolved different adaptations to their host environments. This is 
particularly evident in the case of ergosterol dependency: both T. 
cruzi and Leishmania have an essential requirement for ergosterol 
and/or other 24-alkylsterols, and are unable to survive by 
salvaging cholesterol from the host [62]. Whereas sterol biosyn- 
thesis in Trypanosoma brucei, is apparently suppressed in blood- 
stream forms, relying instead on receptor-mediated endocytosis of 
host low-density lipoproteins carrying cholesterol [63]. Apart from 
differences in their strict requirement for denovo synthesis of sterols. 
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TcCLB. 506297. 260 1 

Tbll.02.4080 1 

CYP51_MYCTU 1 

CP51A HUMAN 1 



MFIEAIVLGLTlLlEYSVYSVKSFNTTRPTDPPVYPVTVPFL5HIVQFGKNPLEFr?0RCKRDLKSGVFTlSIGG5RWlV 

MLLEVAIFLLTALALYSFYFVKSFNVTRPTDPPVYPVTVPILGHIIQFGKSPLGFMQECKRQLKSGIFTINIVGKRVTIV 

MSAVALPRVSGGHDEH GHLEEFRTDPIGLMQRV-RDECGDVGTFQLAGKQVVLL 

MVGKTFTYL 



TcCLB. 506297 . 260 81 

Tbll.02.4080 81 

CYP51_MYCTU 5 4 

CP51A HUMAN 10 



TcCLB. 506297. 260 158 

Tbll.02.4080 158 

CYP51_MYCTU 12 8 

CP51A HUMAN 87 



TcCLB. 506297. 260 237 

Tbll.02.4080 237 

CYP51_MYCTU 202 

CP51A HUMAN 161 



TcCLB. 506297 . 260 313 

Tbll.02.4080 313 

CYP51_MYCTU 2 80 

CP51A HUMAN 236 



GDPHEHSRFFSPRNEILSPREVYT-IMTPVFGEGVAYAA-PYPRMREQLNFL-AEELTIAKFQNFVPAIQHEVRKFMAEN 
GDPHEHSRFFLPRNEVLSPREVYS-FMVPVFGEGVAYAA-PYPRMREQLNFL-AEELTIAKFQNFVPAIQHEVRKFMAAN 

SGSHANEFFFRAGDDDLDQAKAYP-FMTPIFGEGVVFDASPE RRKEMLHNAALRGEQMKGHAATIEDQVRRMI-AD 

LGSDAAALLFNSKNEDLNAEDVYSRLTTPVFGKGVAYDV-PNPVFLEQKKML-KSGLNIAHFKQHVSIIEKETKEYF-ES 
heme support / substrate binding 

I 

WKEDEGVINLLEDCGAMIINTACQCLFGEDLRKRLNARHFAQLLSKMESSLIPAAVFMP-WLLRLPLPQSARCREARAEL 
WDKDEGEINLLEDCSTMIINTACQCLFGEDLRKRLDARRFAQLLAKMESSLIPAAVFLP-ILLKLPLPQSARCHEARTEL 

WGE-AGEIDLLDFFAELTIYTSSACLIGKKFRDQLDGR-FAKLYHELERGTDPLAYVDP-Y LPIESFRRRDEARNGL 

WGE-SGEKNVFEALSELIILTASHCLHGKEIRSQLNEK-VAQLYADLDGGFSHAAWLLPGW LPLPSFRRRDRAHRE- 

substrate recognition 

QKILGEIIVAR-EKEEASKDNNTSDLLGGLLKAVYRDGTR-MSLHEVCGMIVAAMFAGQHTSTITTSWSMLHLMHPKN-- 
QKILSEIIIAR-KEEEVNKDSSTSDLLSGLLSAVYRDGTP-MSLHEVCGMIVAAMFAGQHTSSITTTWSMLHLMHPAN — 
VALVADIMNGRIAN — PPTDKSDRDMLDVLIAVKAETGTPRFSADEITGMFISMMFAGHHTSSGTASWTLIELMRHRDAY 

IKgIFYKAIQKRRQSQEKID-piLQTLijD4T2K2GR£-LTDDEVAGMLIGLLLAGQHTSSTTSAWMGFFLARDKTLQ 

proton delivery 

■ ■■ ■ ■■ • 

»»»»l>ltlt 

KKWLDKLHKEIDBFPAQENYDNVMDEMPFAERCVRESIRRDPPLLMVMRMVKAEVKVGSYWPKGDIIACSPLLSHHDEE 
VKHLEALRKEIEEFPAQLNYNNVMDEMPFAERCARESIRRDPPLLMLMRKVMADVKVGSYWPKGDIIACSPLLSHHDEE 
AAVIDELDELYGDG-RSVSFHA-LRQIPQLENVLKETLRLHPPLIILMRVAKGEFEVQGHRIHEGDLVAASPAISNRIPE 
KgCYLEQKTVCGENLPPLTYDQ-LKDLNLLDRCIKETLRLRPPIMIMMRMARTPQTVAGYTIPPGHQVCVSPTVNQRLKD 



TcCLB. 506297. 260 393 

Tbll.02.4080 393 

CYP51_MYCTU 358 

CP51A HUMAN 315 



I * ★ -kirk ★ i^** 

AFPNPRLWDPERDEK VDGAFIGFGAGVHKCIGQKFALLQVKTILATAFREYDFQLLRDEVPDPDY HTM 

AFPEPRRWDPERDEK VEGAFIGFGAGVHKCIGQKFGLLQVKTILATAFRSYDFQLLRDEVPDPDY HTM 

DFPDPHDFVPARYEQ — PRQEDLLNRWTWI PFGAGRHRCVGAAFAIMQIKAI FSVLLRE YEFEMAQ PPESYRNDHSK 

SWVERLDFNPDRYLQDNPASGE KFAYVPFGAGRHRCIGENFAYVQIKTIWSTMLRLYEFDLIDGYFPTVNY T 



TcCLB. 506297. 260 461 

Tbll.02.4080 461 

CYP51_MYCTU 4 33 

CP51A HUMAN 387 



PS00086: lieme binding 



WGPTLNQCLVKYTRKKKLPS 481 
WGPTASQCRVKYIRRKAAAA 481 
MWQLAQPACVRYRRRTGV-- 451 
TMIHTPENPVIRYKRRSK 404 



I Polymorphic sites in T. cruzi identified in this study (non-synonymous). 
• Sites associated with resistance to azoles in C. albicans and U. necator. 
■k Residues within 7A of the ligand in TcCYPSI (PDB accessions: 3K1 0, 3KHM and 3KSW). 
▼ Residue that determines the unique substrate specificity of TcCYP51 (Lepesheva Gl ef ai. 2005). 
Conserved residues in CYP51 (Lepesheva Gl and Waterman MR 2011). 



Figure 4. Alignment of T. cruzi, T. brucei, M. tuberculosis anA human Lanosteroi 1 4-<ir demethyiases, showing the non-synonymous 
changes identified in this wori< (red arrows). Important residues either in Tc14DM or in the CYP51 family are noted [100,101], as well as residues 
associated with resistance to azoles in C. albicans and U. necator [56,57]. PS00086 is the Prosite Cytochrome 450 motif (cysteine heme-iron ligand 
signature). 

doi:1 0.1 371/journal.pone.0096762.g004 



there are also difFerences in the exact type and abundance of 
synthesized sterols, which may be explained by differences in the 
complement of genes and/ or their activities or regulation. As an 
example, it has been described that T. cruzi amastigotes lack A ' 
sterols, suggesting a lack of A' desaturase activity in this life-cycle 
stage [62,64]. Based on these premises, we decided to investigate 
the accumulation of nucleotide changes (fixed differences) in genes 
from the sterol biosynthesis pathway in African Trypanosomes, 
and Leishmanias, reasoning that the selection acting on these 
genes could be different in each case. For this we identified the 
corresponding orthologs of the yeast and T. cruzi genes used in this 
work (see Table 2) in currendy available kinetoplastid genomes 
(see Methods). The available information includes that of T. brucei 
brucei (2 strains), T. brucei gambiense, Trypanosoma evansi, Trypanosoma 
congolense and Trypanosoma vivax; and that of 4 subspecies of 
Leishmania (major, infantum, braziliensis and mexicana). Using this 
information we proceeded to analyze the nucleotide substitutions 
observed in each group. A summary of this analysis is presented in 
Table 4 (and in Table S3). Apart from the absence of some genes 
that were already discussed, other notable missing genes were the 
orthologs of the ERG 10 (ACAT/thiolase) in African Trypano- 
somes and Leishmanias, and the ERGS (SC22D) and ERG4 
(SC24R) in T. brucei. The reaction catalyzed by the orthologs of 
ERG 10 in other trypanosomatids, is catalyzed in T. brucei by 



another thiolase [15] (Tb927.8.2540, ortholog of POTl, see Table 
S4). However, whereas the absence of the sterol reductase has 
been noticed by others recently [65], we also noticed the absence 
of a gene encoding the C22 sterol desaturase in this organism. 

Because the genetic distances within each phylogenetic group 
are different, it would be perhaps incorrect to compare nucleotide 
changes for each gene across these groups of organisms in a direct 
way. However, it is stiU possible to analyze and compare the 
information on sequence diversity within each group (e.g. column- 
wise in Table 4). In the table, we marked the genes with the 
highest and lowest dN/dS ratios within the pathway, and those 
deviating > 1 standard deviation. When looking at data in this 
way, a number of observations can be made: the less and most 
diverse genes in each pathway correlate very well: PMK, SQLE, 
OSC and 14SR belong to the most diverse genes of the pathway, 
while IDI 1 and 1 4DM are the most conserved genes in all species. 
The only exceptions are the NSDHL and SSI, which belong, 
respectively, to the less and most conserved genes only in African 
trypanosomes, and SC22D, which belongs to the group of highly 
conserved genes in Leishmania but to the group of less conserved 
genes in T. cruzi. Altogether, these data show that although there is 
a general high apparent purifying selection over the SBP of 
trypanosomatids (all dN/dS values are <<1), there are differences 
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in the restrictions imposed to diversification in each group of 
organisms. 

Discussion 

The main goal of our work was to study the genetic diversity 
present in the isoprenoid and sterol biosynthesis pathway (SBP) of 
T. cruzi. These pathways are validated chemotherapeutic targets 
for Chagas Disease, with a number of compounds currendy 
undergoing clinical trials. Because the T. cruzi SBP pathway 
appeared to be incomplete in metaboUc pathway databases such as 
KEGG when we started this work, and because the annotation of 
the SBP genes was also incomplete, we had to perform a small- 
scale bioinformatics analysis to fill in the gaps in available 
sequence and annotation. This task was performed primarily 
based on the well studied S. cerevisiae ergosterol biosynthesis 
pathway. As a result of this strategy, the majority of the genes of 
the pathway have now been identified, with the exception of the 
orthologs of the yeast gene ERG27, which encodes a 3-keto sterol 
reductase. As mentioned, the gene(s) responsible for this activity in 
land plants and sterol synthesizing bacteria have not been 
identified yet [43]. It is therefore highly likely that the 
trypanosomatid 3-keto sterol reductase is phylogenetically closer 
to the plant enzymes, and that once this elusive gene is identified it 
will be readily identified in trypanosomatids. 

We selected 21 genes from this pathway to build a genetic 
diversity profile from representative strains of the six major 
evolutionary lineages of T. cruzi. For this analysis we used at least 2 
strains for each evolutionary lineage therefore effectively sampling 
a large genetic space. Although it is certainly likely that other SNPs 
or fixed differences can be discovered when sequencing from new 
isolates, most probably these new mutations wiU correspond to 
changes that are unique to the new isolate (e.g. introduced during 
the clonal expansion of this particular isolate). In our experience, 
when expanding our re-sequencing analysis of the TcSC5D gene 
with 13 additional strains from lineages TcI-TcVI, only 6 new 
polymorphic sites were identified (see Fig SI in [47]). Considering 
that this is the gene with the highest density of SNPs in the panel 
(82 polymorphic sites found in the present analysis), by doubhng 
the number of strains sampled we only obtained a mild increase of 
sequence diversity information. Thus, we consider that in the 
current study we have covered a significant amount of the genetic 
diversity space of these pathways. 

The strategy employed consisted in the generation of overlap- 
ping amplification products of approximately 750 bp (with 
~100 bp of overlap) for each gene, followed by direct sequencing 
of amplification products in both strands. As a result, the majority 
of the sequenced bases were read at least twice, with coverage in 
both strands. The primers were designed based on the CL-Brener 
genome sequence, and the majority of them were designed against 
the corresponding coding sequence to reduce the possibility of 
amplification problems (under the hypothesis that coding sequenc- 
es are much less polymorphic than non-coding sequences). 
Moreover, when designing primers we avoided SNPs already 
identified from sequences in the public domain by checking 
against the TcSNP database. This strategy enabled the amplifi- 
cation and sequencing of all the selected gene fragments in strains 
from all the lineages, except for the first amplification product of 
the TcMK gene, that could not be amplified initially from lineage 
I. These strains carry a SNP that is specific f )r Tcl and that 
mapped exactiy at the 3'-end of the forward primer (see Table S5). 
We frxed this primer after recent genomic data was made available 
for a number of Tcruzi I strains (Sylvio XIO [66], JR cl4 (accessed 



trough TriTrypDB [67]), and TcAdriana (Westergaard G and 
Vazquez M, unpublished results). 

Our results show that the TcSC5D gene is the most 
polymorphic gene in the pathway, with 1 1 SNPs per 1 00 bp (for 
a total of 82 SNPs in 720 bp analyzed). This SNP density is more 
than twofold larger than those observed for other re-sequenced 
genes. Despite this, the gene is under strong purification pressure, 
as judged by the ratio of missense and sense SNPs (or its dN/dS 
ratio). The TcI4DM and TcIDIl genes had the lowest dN/dS 
ratios in the panel. Interestingly, the orthologs in Leishmanias and 
African trypanosomes also display the lowest ratios in each case 
(See Table 4), highlighting the level of apparent selection pressure 
for these enzymes in trypanosomatids. Furthermore the Tcl4DM, 
TcHMGS and TcHMGR genes have the lowest density of 
missense SNPs in the panel, with only 0.51, 0.43 and 0.56 non 
synonymous SNPs every 100 bp respectively, so at least from these 
genetic evidence alone, these genes would appear to be the best 
candidates for drug development in T. cruzi. 

The SMO-like (or SC5D-like) genes of T. cruzi and 
Leishmania 

As described above, when looking for the T. cruzi orthologs of 
the ERG25 (C4-methyl oxidase) gene, we identified two homologs 
(two pairs of allelic variants from the hybrid genome), which are 
members of the Fatty Acid Hydroxylase superfamUy (Pfam 
Domain PF04116). Based on best-reciprocal BLAST hits and 
careful examination we concluded that these genes 
(TcCLB.5 11339.20, TcCLB.509235.20) are not orthologous to 
ERG25, and are probably divergent homologs of a difiFerent 
ancestral gene. Our phylogenetic reconstruction suggests that 
Leishmanias have retained the ancestral ortholog of ERG25, 
although the apparent selection acting on this gene is more relaxed 
than that observed for other genes in the pathway (see Table 4). In 
any case, based on this analysis it is tempting to speculate that the 
group of genes we call SMO-like, which are present in all 
trypanosomatid species analyzed in this work, are the ones that are 
responsible for the essential C4-demethylation step. 

Interestingly, when performing BLASTP searches of SMO-like 
proteins against fungal genomes, we noticed a number of SUR2 
(Sphinganine C4-hydroxylase) homologs among the significant hits 
(see Table SI). In yeast, SUR2 (also a member of the FA 
hydroxylase superfamily) catalyzes the conversion of sphinganine 
to phytosphingosine in sphingolipid biosynthesis. Recentiy, the 
presence of phytosphingosine in trypanosomes was demonstrated 
by mass spectrometry [68], and the authors also show that the 
biosynthesis of phytosphingosine is driven by bi-functional 
hydroxylase/desaturase enzymes. However, the trypanosomatid 
genes they identified as responsible for this activity are not 
orthologs of yeast SUR2 (see Figure 2 in Vacchina et al. [68]). In 
our reciprocal BLAST searches using the yeast SUR2 protein as 
query against trypanosomatid genomes, we always retrieve the 
same set of ERG3 orthologs, and SMO-like genes. Overall these 
data suggest that in trypanosomes the orthologs of both SUR2 and 
ERG25 have been lost, and that the SMO-like genes grouped in 
the middle branch in Figure 1 could represent an ancestral 
hydroxylase/desaturase that has adjusted (or is stiU adjusting) to a 
new functional niche in these organisms (cellular localization or 
time/stage of expression, etc). Interestingly the expression of this 
gene is higher in amastigotes and trypomastigotes, the two life 
cycle stages that occur in the mammalian host (data from Minning 
et al. [69]). 

The protein sequences encoded by these genes show the three 
canonical conserved histidine boxes (HxxxH, HxxHH, and 
HxxHH) present in all fatty acid hydroxylase family members. 
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The distribution of the accumulated changes is shown in Figure 
SI. None of the non-synonymous changes affect these highly 
conserved motifs, and at least a third of these (depending on the 
membrane topology prediction) are predicted to be exposed (not 
embedded in the membrane). This is important because, as 
reviewed in [70], the evolution of new regioselectivities in these 
enzymes would not involve the active site, but adjacent sequences. 
However, the failure to predict a reasonable topology (see Fig SI) 
points to the need to do an in-depth study of the membrane 
topology of this protein. 

The use of dN/dS ratios 

Although appealing because it normalizes substitutions over 
silent vs non-silent sites, the dN/dS ratio is extremely sensitive to 
violation of a number of assumptions, such as the need to use data 
from divergent lineages, and is also affected by differences in the 
time since divergence and/or effective population sizes of 
compared species, among others [71]. This has been noticed in 
many studies, particularly in the case of bacterial species (even 
those with highly clonal populations) [72]. As shown by these and 
other authors, the dN/ dS ratio is sometimes unstable, displaying a 
time dependency that makes it difficult to compare genomes at 
differing levels of divergence [73]. 

In preliminary studies with genes from these pathways, we came 
across a number of puzzling observations: some highly conserved 
genes in T. brucei showed unusually high (>>1) dN/dS values. 
This was due to the restricted number of genomes analyzed (3), 
which resulted in a low SNP frequency. Therefore the dN/ dS ratio 
calculation did not provide enough depth for a reliable analysis as 
most genes had very few (or none) synonymous substitutions. This 
effect disappeared when adding sequences from other african 
trypanosomes (more distant lineages), which prompted us to 
investigate dN/dS trajectories. 

As shown in the Figure S2, for pairwise comparisons where the 
number of polymorphisms is low, the dN/ dS ratios are unstable 
and tend to display large values particularly for comparisons 
between highly conserved pairs of sequences. This should not be 
interpreted as an apparent diversifying selection, as these dN/dS 
values tend to (paradoxically) stabihze when more divergent 
sequences are compared. These results are in line with previous 
observations, and point to the use of care when analyzing dN/dS 
values. As shown by others [71-73] the use of dN/dS should be 
restricted to sequences from divergent hneages (with no sexual 
exchange among them). However, in the case of trypanosomatids 
this is difficult to enforce. Both within the T. cruzi and Leishmania 
lineages there are natural hybrids [7,8,49,74,75], which suggests 
that, although infrequent, genetic exchange have occurred in the 
wild. Therefore, the possibility of recombination between the 
parental alleles in these hybrids cannot be excluded. This is also 
the case for T. hrucei, which has an extant capacity for both clonal 
and sexual propagation with varying degrees of inbreeding or out- 
crossing, with T. hrucei gambiense showing a strict clonality [76], 
whereas different subpopulations of T. b. rhodesiense show both 
clonality and epidemic or close to panmictic behavior [77]. 

For all these reasons we believe that extreme care should be 
used when interpreting the meaning of isolated dN/ dS values (e.g. 
in Table 4), or the meaning of comparisons of dN/ dS across taxa 
in the absence of the expected trajectories of dN/ dS over time 
(because of the critical effect of time since divergence of the 
Uneages). This makes it difHctdt to answer simple questions such as 
whether the C-3 sterol dehydrogenase gene (ERG26 ortholog) is 
under the same apparent selection in T. cruzi, the African 
trypanosomes and Leishmanias. The Table 4 shows that the 
dN/dS values for this gene are similar in aU cases (0.058 in T. 



cruzi, 0.057 in Leishmania, and 0.051 in African tryps). However, 
these genes differ wildly in their SNP densities (see Table S3), 
which makes it diSicult to provide a simple answer. 

Conclusions 

In this study we have explored the genetic diversity present in an 
important pathway of T. cruzi, which is the current target of a 
number of drugs undergoing clinical trials. Our analysis show that 
the targets of current drugs are highly conserved across all 
evolutionary lineages. We have also filled a number of holes in the 
pathway by completing the sequences of a number of genes that 
were missing or truncated in the current reference genome. Finally 
by comparing genes across other important trypanosomatids, we 
show than in spite of differences in diversity, all trypanosomes 
show a mostiy conserved set of enzymes. 

Materials and Metliods 

T. cruzi stocks and strains 

Strains used in this study (and the corresponding current lineage 
classification) were: Sylvio XIO cll and Dm28 (Tel); MASl cll, 
TU18 cl93, IW cl4 (TcII); M6241 cl6, M5631 cl5 and X109/2 
(TcIII); Canlll cll. Dog Theis and 92122102R (TcIV); Sc43 cl9 
and MN cl2 (TcV); Tulahuen cl2, CL Brener and P63 cll (TcVI). 

Oligonucleotides and gene identifiers 

For each selected gene, a number of primers were designed for 
PCR-amplification. Using information from the TcSNP resource, 
we selected primers to avoid known polymorphic bases. Taking 
into account that in the direct Sanger sequencing of PGR products 
the chromatogram quality is optimal in the range from 50 to 700 
bp, a desirable length of the amplification products was set around 
750 bp. This length would also maximize our ability to sequence 
both strands of the amplification product, with good quality. 
Depending on the size of each selected gene, one or more 
overlapping amplification products were obtained. The list of the 
designed primers for each gene and the size of the corresponding 
amplification product is shown in Table S5. In only one case we 
had to design a separate primer (Tc-Mev-kinase26-TcI-fw) to 
amplify a fragment in one lineage (Tcl) because there was a SNP 
at the 3' end of the primer that was absent in the release 1 of the 
TcSNP database. 

Amplification and sequencing 

Selected fragments were amplified by PGR using Taq 
polymerase (Invitrogen) in a Biometra T Professional Gradient 
96 cycler. Amplification mixtures contained 10 pmol of each 
primer, PGR buffer (Invitrogen), 1 .5 mM MgGl^, 50 ng of 
genomic DNA, 200 |xM dNTPs, 2.5 U Taq polymerase (Invitro- 
gen), and water to a final volume of 25 [ll. After denaturing at 
94°G for 2 minutes, thermal cycling was performed for 35 (ycles at 
94°G for 30 seconds, followed by 30 seconds at a temperature set 
to 5°C less than the melting temperature of the selected primers, 
followed by 72°C for 30 seconds. Reactions were finished by a 5 
minute incubation at 72°G. Amplification products were checked 
in 1.2% agarose gels stained with ethidium bromide to verify the 
presence of a single amplification product. Next, an aliquot (10 |il) 
of the amplification reaction was treated with 1 U of Exonuclease 
I (Fermentas) and 10 U of Shrimp AlkaUne Phosphatase 
(Fermentas) for 45 minutes at 37°G and then for 30 minutes at 
80°G to inactivate these enzymes. Subsequently two sequencing 
reactions were prepared, each with one of the primers used for the 
amplification of the product. Sequencing was carried out in an 
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Applied Biosystems 3130 capillary sequencer using a Big-Dye 
terminator cycle sequencing kit, according to the instructions of 
the manufacturer. 

SNP identification and scoring 

Gene fragments were PCR-amplified from every strain of the 
panel and sequenced in both strands. Base calling of chromato- 

grams, assembly of sequences, detection of polymorphisms and 
manual inspection of assembled sequences and polymorphisms 
was done using a software package composed of Phred (version 
0.020425.C) [78,79], Phrap (version 0.9909329), Polyphred (ver- 
sion 5.04) [80] and Consed (version 15.0) [81]. BasecaUing of 
chromatograms was done by Phred. Sequences were then 
assembled by Phrap. Polyphred was used to process phrap 
assemblies to detect polymorphic sites. All candidate SNPs 
identified by PolyPhred (score > 70/99), including heterozygous 
peaks, were visualized with Consed. A few false positives, and false 
negatives were removed/ added after this manual inspection. 

Other sequences used in this study 

For the comparative analysis of SBP genes in kinetoplastid 
genomes, we have obtained the corresponding protein sequences 
for each of the yeast and/ or T. cruzi orthologs used in this study 
from the TriTr)'pDB database [67]. A BLAST search using the 
corresponding SBP gene as query was used to identify the 
corresponding ortholog. This has been cross-ciK-ckc-d by inspection 
of ortholog clusters at the OrthoMCL database [41]. The 
sequences used belong to the following species/strains: T. brucd 
brucei strains TREU927/4 [82] and Lister 427 (George Cross, 
unpublished), Trypanosoma hrucei gamhiense [83]; Trypanosoma con- 
golense and Trypanosoma vivax [84 1; Trjpano.mina n'linsi (STIB 805) 
(unpublished); L. major [85], L. infantum [86], L. braziliensis [86], and 
L. mexicana [87]. 

Alignment and calculation of dN/dS 

For calculation of cIN/ dS ratios (co) nucleotide sequences were 
first trimmed so that they all start and end at the corresponding 
START and STOP codons, and/or they have a similar length that 
is also a multiple of 3. Nucleotide alignments for each gene were 
then obtained using TranslatorX, which produces a codon 
alignment guided by aminoacid information [88]. Using these 
alignments as input, together with an unrooted phylogenetic tree 
with 3 branches containing T. cruzi lineages, Leishmania species 
and African trypanosome species, we ran the codeml program 
from the PAML package [89] with parameters "model = 0; 
NSSites = 0" to obtain optimized lengths for these branches by 
maximum likelihood. Finally, using these optimized branch 
lengths, we calculated the dN/dS ratio (oj) for each of the 3 
branches by running again codeml with the following parame- 
ters:"model = 2; NSSites = 0". 

Mapping substitutions on three dimensional structures 

CrystaUographic structures of enzymes of the sterol biosynthetic 
pathway were obtained from the Protein Data Bank (PDB, http:// 
www.rcsb.org). Using the molecular graphics viewer VMD 
(version 2.8.6) [90] together with the multiple sequence alignment 
plugin for VMD [91] we mapped sequences from different strains 
on top of the reference sequence from which the structure was 
obtained. Atomic distances from each residue to the co-crystaUized 
Ugand were measured using standard tools implemented in VMD. 
Structures used in this work (and their source organism, and T. 
cruzi homolog) were: lwl5 (human, TcACAT) [92]; 2fa3 (Brassica 
juncea, TcHMGS) [93]; 3bgl (human, TcHMGR) [94]; 2hfu {L. 



major, TcMK) [59]; 2hke {T. brucd, TcMVD)[95]; lyhm (T. cruzi, 
TcFPPS) [96]; lezf (human, TcSQS) [97]; lw6k (human, TcOSC) 
[98]; 3khm, 3klo, 3ksw (7: cruzi, TcCYP51) [99]. 

Data deposition 

The sequences reported in this paper have been deposited in the 
GenBank database under the following accession numbers: 
JN050313-JN050853, HQ586972-HQ586973, and KF290395- 
KF290460. Heterozygous sequence polymorphisms have been 
submitted as ambiguities in the sequences using standard lUPAC 
notation. Sequence polymorphisms identified between different 
strains/ clones are available as supplementary material, and will be 
also available in a future release of the TcSNP database (http:// 
snps.tcruzi.org, [46]). 

Supporting Information 

Figure SI Distribution of observed SNPs in the TcSMO- 
like genes of T. cruzi. Based on the prediction of trans- 
membrane spanning domains (see TMHMM probability plot at 

the bottom), we created two alternative representations, following 
[70]. The distribution of synonymous and non-synonymous SNPs 
is shown according to these models. The representations differ in 
the presence/ absence of the second (non-predicted) trans-mem- 
brane domain. In these two representations the location of the 3rd 
histidine box always lies on the opposite side of the membrane. 
Both topologies may be wrong and an in-depth study may be 
required to establish the correct topology of these proteins. 
(PDF) 

Figure S2 Stability of the dN/ dS ratio for the trypano- 
somatid isoprenoid and ergosterol biosynthesis genes. 

The dN/dS ratio was calculated for aU possible pairwise 
comparisons of each gene within each phylogenetic group/branch 
(e.g. the genes for aU lanosterol demethylases were aligned with 
each other within the T. cruzi group, the Leishmania group and the 
T. Aram'/ African trypanosomes group). The plot therefore 
summarizes data from all genes in the pathways, and for all 
species analyzed. The dN/ dS values were then sorted, grouped in 
bins of 10 and plotted. The error bars show the standard deviation 
within each bin. 
(PDF) 

Table SI Reciprocal BLASTP searches between fungal 
and kinetoplastid ERG25/ERG3 homologs. The file 

contains a summary of BLASTP searches against fungal and 
kinetoplastid protein databases. BLAST searches using the yeast 
ERG3/ERG25 protein sequences as query, were run at the 
TriTrypDB BLAST server against a database of Kinetoplastid 
proteomes from reference and draft genomes. BLAST searches 
using a number of putative T. cruzi orthologs of these yeast genes 
were run at the SGD Fungal BLAST Server, against a database 
containing a selection of fungal genomes. Each BLASTP search is 
shown in a separate tab in the Excel workbook. 
PCLS) 

Table S2 List of nucleotide changes (SNPs, fixed differ- 
ences) identified for each gene analyzed. The excel file 
contains one spreadsheet per gene with information on the location 
of each SNP relative to the start codon, the PolyPhred score for the 
SNP, and the character state of the SNP in each strain/lineage. 
PCLS) 

Table S3 Comparative genetic diversity of kinetoplastid 
SBP genes. 

PCLS) 
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Table S4 Presence of two types of thiolases in T. cruzi. 

The file contains the summary of results from reciprocal BLASTP 
searches between fungal and kinetoplastid ERGIO/POTI homo- 
logs. 
(XLS) 

Table S5 List of oligonucleotide primers and amplifi- 
cation products analyzed in this study. The table lists the 
oligonucleotide primers used to amplify each PGR product, the 
corresponding product size, and the number of products per gene/ 
locus. 
(XLS) 
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